Fine-grained capturing of 3D HOI boosts human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, existing works mostly assume that humans interact with rigid objects using only a few body parts, limiting their scope. In this paper, we address the challenging problem of f-AHOI, wherein whole human bodies interact with articulated objects whose parts are connected by movable joints. We present CHAIRS, a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects throughout the entire interactive process, as well as realistic and physically plausible full-body interactions. We demonstrate the value of CHAIRS on object pose estimation. By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions. Given an image and an estimated human pose, our model first reconstructs the pose and shape of the object, then optimizes the reconstruction according to a learned interaction prior. Under both evaluation settings (i.e., with or without knowledge of objects' geometries/structures), our model significantly outperforms the baselines. We hope CHAIRS will advance the community towards finer-grained interaction understanding. We will make the data/code publicly available.
Images captured in low-light scenes often suffer from severe degradations, including poor visibility, color casts, and intensive noise. These factors not only hurt image quality but also degrade the performance of downstream low-light vision (LLV) applications. A variety of deep learning methods have been proposed to enhance the visual quality of low-light images; however, they mostly rely on significant architecture engineering to obtain a proper low-light model and often suffer from a high computational burden. Moreover, it remains challenging to extend these enhancement techniques to handle other LLV tasks. To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that can not only solve the low-light enhancement task but is also flexible enough to handle other, more challenging downstream vision applications. Specifically, we first establish a nested optimization formulation, together with an unrolling strategy, to explore underlying principles for a series of LLV tasks. We then construct a differentiable strategy to cooperatively search scene- and task-specific architectures for RUAS. Last but not least, we demonstrate how to apply RUAS to both low-level and high-level LLV applications (e.g., enhancement, detection, and segmentation). Extensive experiments verify the flexibility, effectiveness, and efficiency of RUAS.
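To make the Retinex-inspired unrolling idea concrete, here is a minimal, heavily simplified Python sketch: it alternates between refining an illumination map and recovering the reflectance as the enhanced image. The box-filter smoothing stands in for the small searched modules RUAS would use at each stage; the function name and all details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def retinex_unroll(low_light, n_stages=3, eps=1e-4):
    """Toy Retinex-style unrolled enhancement: alternately refine an illumination
    map L and recover the reflectance I / L as the enhanced image. In RUAS each
    stage would be a small searched module; the box-filter smoothing below is a
    hand-crafted stand-in, purely for illustration."""
    h, w = low_light.shape[:2]                     # expects a float image in [0, 1], shape (H, W, 3)
    L = low_light.max(axis=-1, keepdims=True)      # initial illumination estimate (max channel)
    for _ in range(n_stages):
        pad = np.pad(L, ((1, 1), (1, 1), (0, 0)), mode="edge")
        L = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
        L = np.maximum(L, low_light.max(axis=-1, keepdims=True))  # keep L >= image (Retinex constraint)
    return np.clip(low_light / (L + eps), 0.0, 1.0)               # reflectance as the enhanced output
```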
Existing grasp synthesis methods are either analytical or data-driven. The former are often limited to specific application scopes. The latter depend heavily on demonstrations and therefore suffer from generalization issues; e.g., a model trained on human grasp data can hardly transfer to a three-finger gripper. To tackle these deficiencies, we formulate a fast and differentiable force closure estimator capable of producing diverse and physically stable grasps with arbitrary hand structures, without any training data. Although force closure is commonly used as a measure of grasp quality, it has not been widely adopted as an optimization objective for grasp synthesis, primarily due to its high computational complexity. In contrast, the proposed differentiable method can evaluate force closure within milliseconds. In experiments, we validate the efficacy of the proposed method in six different settings.
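As a rough illustration of using a force-closure surrogate as a differentiable objective, the sketch below builds a grasp map from contact points and unit contact normals and measures the residual net wrench. It is a simplified stand-in for the paper's estimator (names and details are assumed for illustration); wrapping it in an autodiff framework would provide the gradients needed for grasp optimization.

```python
import numpy as np

def skew(v):
    """3x3 cross-product (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def force_closure_error(contacts, normals):
    """Illustrative force-closure surrogate: stack the wrenches produced by unit
    forces along the contact normals and measure how far their sum is from zero.
    Small values mean the contacts can (nearly) balance each other, a necessary
    ingredient of force closure. Simplified stand-in, not the paper's estimator."""
    contacts = np.asarray(contacts, dtype=float)          # (n, 3) contact points
    normals = np.asarray(normals, dtype=float)            # (n, 3) inward contact normals
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Grasp map G maps contact forces to the net wrench (force, torque).
    G = np.concatenate([
        np.concatenate([np.eye(3) for _ in contacts], axis=1),
        np.concatenate([skew(x) for x in contacts], axis=1),
    ], axis=0)                                            # (6, 3n)
    c = normals.reshape(-1)                               # unit forces along the normals
    return float(np.linalg.norm(G @ c))                   # residual net wrench
```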
Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.
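For context, here is a small sketch of the two reference predictors that such in-context learners are compared against on a linear-regression prompt: the closed-form ridge solution and the predictor reached by a finite number of full-batch gradient-descent steps. This is not a transformer implementation, only the baseline estimators, with assumed toy dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))                 # in-context examples x
y = X @ w_true + 0.1 * rng.normal(size=n)   # labels f(x) + noise
x_query = rng.normal(size=d)                # query point to predict on

# Closed-form ridge predictor fit to the in-context examples.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Predictor obtained by a finite number of full-batch gradient-descent steps from zero.
w_gd = np.zeros(d)
lr = 0.01
for _ in range(500):
    w_gd -= lr * (X.T @ (X @ w_gd - y) / n)

# Reference predictions an in-context learner's output would be compared against.
print(x_query @ w_ridge, x_query @ w_gd)
```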
Real-world machine learning applications often involve deploying neural networks to domains that are not seen in the training time. Hence, we need to understand the extrapolation of nonlinear models -- under what conditions on the distributions and function class, models can be guaranteed to extrapolate to new test distributions. The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift. This paper makes some initial steps toward analyzing the extrapolation of nonlinear models for structured domain shift. We primarily consider settings where the marginal distribution of each coordinate of the data (or subset of coordinates) does not shift significantly across the training and test distributions, but the joint distribution may have a much bigger shift. We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$, where $f_i$ is an arbitrary function on the subset of features $x_i$, can extrapolate to unseen distributions, if the covariance of the features is well-conditioned. To the best of our knowledge, this is the first result that goes beyond linear models and the bounded density ratio assumption, even though the assumptions on the distribution shift and function class are stylized.
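A toy sketch of this setting (all numbers and function choices are assumed for illustration): fit an additive model $f(x)=\sum_i f_i(x_i)$ on data with correlated coordinates, then evaluate it on a test set where the joint distribution shifts while each marginal stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Training distribution: the two coordinates are strongly correlated.
z = rng.normal(size=(n, 1))
x_train = np.hstack([z + 0.3 * rng.normal(size=(n, 1)),
                     z + 0.3 * rng.normal(size=(n, 1))])
# Test distribution: same per-coordinate marginals, shifted joint
# (coordinates made independent by permuting one column).
x_test = np.hstack([x_train[:, :1], rng.permutation(x_train[:, 1:2])])

def target(x):            # additive ground truth: f(x) = f_1(x_1) + f_2(x_2)
    return x[:, 0] ** 2 - 2.0 * x[:, 1]

def additive_features(x): # model class f(x) = sum_i f_i(x_i) with quadratic f_i
    return np.hstack([x, x ** 2, np.ones((len(x), 1))])

w = np.linalg.lstsq(additive_features(x_train), target(x_train), rcond=None)[0]
mse_ood = np.mean((additive_features(x_test) @ w - target(x_test)) ** 2)
print(f"OOD mean squared error: {mse_ood:.2e}")   # stays tiny despite the joint shift
```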
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks for various settings. However, the underlying working of SAM remains elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect, when full-batch gradients are applied. Furthermore, we also prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of Hessian when SAM is applied.
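For reference, here is a minimal PyTorch sketch of the full-batch SAM update discussed above: ascend to the perturbed weights $w + \rho\, g/\|g\|$, take the gradient there, and apply it from the original weights. Function and argument names are assumptions for illustration, not a specific library API.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One (full-batch) SAM update: perturb to w + eps, take the gradient there,
    then descend from the original weights with the base optimizer."""
    # First pass: gradient at the current weights w (only used for the direction).
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Move to the (approximate) worst case w + eps, with eps = rho * g / ||g||.
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights, stored in .grad.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then apply the base optimizer step with the SAM gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    base_opt.step()
```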
We often see undesirable tradeoffs in robust machine learning, where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: robust classifiers obtained via specialized techniques that remove spurious features often have better OOD accuracy but worse ID accuracy than standard classifiers trained with ERM. In this paper, we find that ID-calibrated ensembles -- simply ensembling a standard and a robust model after calibrating each on ID data only -- outperform both in terms of ID and OOD accuracy. On eleven natural distribution-shift datasets, ID-calibrated ensembles get the best of both worlds: strong ID accuracy and strong OOD accuracy. We analyze this method in a stylized setting and identify two important conditions for the ensemble to perform well both ID and OOD: (1) the standard and robust models need to be calibrated (on ID data, since OOD data is unavailable), and (2) OOD exhibits no anti-correlated spurious features.
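Below is a minimal sketch of ID-calibrated ensembling, assuming we have validation logits and labels from both models on ID data: fit a temperature per model on ID data only, then average the calibrated probabilities. Function names are illustrative, not from any specific codebase.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def fit_temperature(logits, labels):
    """Fit a single temperature on ID validation data by minimizing the NLL."""
    def nll(t):
        p = softmax(logits / t, axis=1)
        return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

def id_calibrated_ensemble(logits_std, logits_rob, t_std, t_rob):
    """Average the ID-calibrated probabilities of the standard and robust models."""
    p_std = softmax(logits_std / t_std, axis=1)
    p_rob = softmax(logits_rob / t_rob, axis=1)
    return (p_std + p_rob) / 2
```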
A major challenge in modern machine learning is to theoretically understand the generalization properties of overparameterized models. Many existing tools rely on uniform convergence (UC), a property that, when it holds, guarantees that the test loss will be close to the training loss, uniformly over a class of candidate models. Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is proving novel generalization bounds in two such settings, one linear and one non-linear. We study the linear classification setting of Nagarajan and Kolter, as well as a quadratic ground-truth function learned via a two-layer neural network in the non-linear regime. We prove a new type of margin bound showing that, above a certain signal-to-noise threshold, any near-max-margin classifier achieves almost zero test loss in both settings. Our results show that being near-max-margin is important: while any model achieving at least a $(1-\epsilon)$ fraction of the max margin generalizes well, a classifier achieving only half of the max margin may generalize poorly. We also strengthen the UC impossibility result of Nagarajan and Kolter, proving that one-sided UC bounds and classical margin bounds will fail on near-max-margin classifiers. Our analysis provides insight into why memorization can coexist with generalization: we show that in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data. The presence of the overfitting components is enough to preclude UC, but the near-extremal margin guarantees that enough generalizable components are present.
The remarkable success of reinforcement learning (RL) relies heavily on observing the reward of every visited state-action pair. In many real-world applications, however, the agent can observe only a score that reflects the quality of the whole trajectory, referred to as the trajectory-wise reward. In such a situation, standard RL methods struggle to exploit trajectory-wise rewards well and can incur large bias and variance errors in policy evaluation. In this work, we propose a novel offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED), which decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards. To ensure that the value function constructed by PARTED is always pessimistic with respect to the optimal one, we design a new penalty term to offset the uncertainty of the proxy rewards. For general episodic MDPs with large state spaces, we show that PARTED with overparameterized neural network function approximation achieves an $\tilde{\mathcal{O}}(d_{\text{eff}} H^2/\sqrt{N})$ suboptimality, where $H$ is the episode length, $N$ is the total number of samples, and $d_{\text{eff}}$ is the effective dimension of the neural tangent kernel matrix. To further illustrate the result, we show that PARTED achieves an $\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality for linear MDPs, where $d$ is the feature dimension, matching the rate with neural-network function approximation when $d_{\text{eff}} = dH$. To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient for general MDPs with trajectory-wise rewards.
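To illustrate the reward-redistribution step, here is a small least-squares sketch with linear per-step features (an assumed simplification of the neural-network version described above): fit weights so that the summed per-step proxy rewards match each trajectory's return, then read off per-step proxy rewards.

```python
import numpy as np

def redistribute_rewards(trajectories, returns, lam=1.0):
    """Least-squares reward redistribution (illustrative, linear-feature sketch):
    fit theta so that sum_t phi(s_t, a_t)^T theta matches the trajectory return,
    then use r_hat(s, a) = phi(s, a)^T theta as the per-step proxy reward.

    trajectories: list of (T_i, d) arrays of per-step features phi(s_t, a_t)
    returns:      array of trajectory-wise rewards, one per trajectory
    """
    # Design matrix: each row is the feature sum over one trajectory.
    Phi = np.stack([traj.sum(axis=0) for traj in trajectories])   # (n, d)
    d = Phi.shape[1]
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d),
                            Phi.T @ np.asarray(returns))          # ridge regression
    # Per-step proxy rewards for each trajectory.
    return [traj @ theta for traj in trajectories]
```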
We consider unsupervised domain adaptation (UDA), in which labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain. Conventional UDA methods (e.g., domain adversarial training) learn domain-invariant features to improve generalization to the target domain. In this paper, we show that contrastive pre-training, which learns features on unlabeled source and target data and then fine-tunes on labeled source data, is competitive with strong UDA methods. However, we find that contrastive pre-training does not learn domain-invariant features, diverging from conventional UDA intuition. We show theoretically that contrastive pre-training can learn features that vary across domains yet, after fine-tuning, still generalize to the target domain by disentangling domain and class information. Our results suggest that domain invariance is not necessary for UDA. We empirically validate our theory on benchmark vision datasets.
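As a sketch of the contrastive pre-training stage (a SimCLR-style loss is assumed here purely for illustration), the function below computes the standard NT-Xent objective on two augmented views of a batch; in this kind of pipeline it would be applied to pooled unlabeled source and target images, followed by fine-tuning a classifier head on labeled source data only.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss on two augmented views of the same batch.
    z1, z2: (B, d) embeddings of the two views of each image."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)           # (2B, d) unit-norm embeddings
    sim = z @ z.t() / tau                                 # pairwise cosine similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))            # a view is never its own positive
    B = z1.shape[0]
    # The positive for example i is the other view of the same image.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)
```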